Abstract

For random graphs distributed according to a stochastic block model, we consider
the inferential task of partitioning vertices into blocks using spectral techniques. Spectral
partitioning using the normalized Laplacian and spectral partitioning using the adjacency
matrix have both been shown to be consistent as the number of vertices tends to infinity. Importantly, both
procedures require that the number of blocks and the rank of the communication 
probability matrix are known, even as the rest of the parameters may be unknown. 
In this article, we prove that the (suitably modified) adjacency-spectral partitioning 
procedure, requiring only an upper bound on the rank of the communication probability 
matrix, is consistent. Indeed, this result demonstrates a robustness to model mis- 
specification; an overestimate of the rank may impose a moderate performance penalty, 
but the procedure is still consistent. Furthermore, we extend this procedure to the 
setting where adjacencies may have multiple modalities and we allow for either directed 
or undirected graphs. 
1 Background and overview



Our setting is the stochastic block model [121 ES] — a random graph model in which a set 
of n vertices is randomly partitioned into K blocks and then, conditioned on the partition, 
the existence of an edge between each pair of vertices is an independent Bernoulli trial with
parameter determined by the block membership of the pair. (The model details are specified in
Section 2.1.)

The realized partition of the vertices is not observed, nor are the Bernoulli trial param- 
eters known. However, the realized vertex adjacencies (edges) are observed, and the main 
inferential task is to estimate the partition of the vertices, using the realized adjacencies as 
a guide. Such an estimate will be called consistent if and when, in considering a sequence 
of realizations for n = 1, 2, 3, . . . with common model parameters, it happens almost surely 
that the fraction of misassigned vertices converges to zero as $n \to \infty$.

Rohe et al. [20] proved the consistency of a block estimator that is based on spectral
partitioning applied to the normalized Laplacian, and Sussman et al. [21] extended this to
prove the consistency of a block estimator that is based on spectral partitioning applied to
the adjacency matrix. Importantly, both of these procedures assume that K and the rank of
M are known (where $M \in [0,1]^{K \times K}$ is the matrix consisting of the Bernoulli parameters for
all pairs of blocks), even as the rest of the parameters may be unknown. In this article, we 
prove that the (suitably modified) adjacency-spectral partitioning procedure, requiring only 
an upper bound for rankM, gives consistent block estimation. We demonstrate a robustness 
to mis-specification of rankM; in particular, if a practitioner overestimates the rank of M in 
carrying out adjacency spectral partitioning to estimate the blocks, then the consistency of 
the procedure is not lost. Indeed, this is a model selection result, and we provide estimators 
for K and prove their consistency. 

Our analysis and results are valid for both directed and undirected graphs. We also 
allow for more than one modality of adjacency. For instance, the stochastic block model can 
model a social network in which the vertices are people, and the blocks are different com- 
munities within the network such that probabilities of communication between individual 
people are community dependent, and there is available information about several differ- 
ent modes of communication between the people; e.g. who phoned whom on cell phones, 
who phoned whom on land lines, who sent email to whom, who sent snail mail to whom, 
with a separate adjacency matrix for each modality of communication. Indeed, if there are
different matrices M for each mode of communication, even if there is dependence in the
communications between two people across different modalities, our analysis and results will
hold, provided that every pair of blocks is "probabilistically discernable" within at least one
mode of communication. (This will be made more precise in Section 2.1.)

Latent space models (e.g. Hoff et al. [H]) and, specifically, random dot product models 
(e.g. Young and Scheinerman [26]) give rise to the stochastic block model. Indeed, the
techniques that we use in this article involve generating latent vectors for a random dot 
product model structure which we then use in our analysis. Nonetheless, our results can 
be used without awareness of such random-dot-product-graph underlying structure, and we 
do not concern ourselves here with estimating latent vectors for the blocks. (In any event, 
latent vectors are not uniquely determinable here). 

Consistent block estimation in stochastic block models has received much attention. For- 
tunato [TU] and Fjallstrom [D] provide reviews of partitioning techniques for graphs in general. 
Consistent partitioning of stochastic block models for two blocks was accomplished by Sni- 
jders and Nowicki [23] in 1997 and for equal-sized blocks by Condon and Karp [7] in 2001. 
For the more general case, Bickel and Chen [1] in 2009 demonstrated a stronger version of 
consistency via maximizing Newman-Girvan modularity [IB] and other modularities. For 
a growing number of blocks, Choi et al. [3] in 2010 proved consistency of likelihood based 
methods. In 2012, Bickel et al. [2] provided a method to consistently estimate the stochastic 
block model parameters using subgraph counts and degree distributions. This work and the 
work of Bickel and Chen [1] both consider the case of very sparse graphs.

Rohe et al. [20] in 2011 used spectral partitioning on the normalized Laplacian to consis- 
tently estimate a growing number of blocks and they allow the minimum expected degree to 
be at least $\Theta(n/\sqrt{\log n})$. Sussman et al. [21] extended this to prove consistency of spectral
partitioning directly on the adjacency matrix for directed and undirected graphs. Finally, 
Rohe et al. [21J proved consistency of bi-clustering on a directed version of the Laplacian 
for directed graphs. Unlike modularity and likelihood based methods, these spectral partitioning
methods are computationally fast and easy to implement. Our work extends these
spectral partitioning results to the situation when the number of blocks and the rank of the
communication matrix are unknown. We present the situation for fixed parameters, and in
Section 9 we discuss possible extensions.

The adjacency matrix has been previously used for block estimation in stochastic block
models by McSherry [17], who proposed a randomized algorithm when the number of blocks
as well as the block sizes are known. Coja-Oghlan [B] further investigates the methods proposed
in McSherry [17] and extends the work to sparser graphs. This method relies on bounds
in the operator norm, which have also been investigated by Oliveira [19] and Chung et al.
[5]. In 2012, Chaudhuri et al. [4] used an algorithm similar to the one in McSherry [17] to
prove consistency for the degree corrected planted partition model, a slight restriction of the
degree corrected stochastic block model proposed in [15]. Notably, Chaudhuri et al. [4] do
not assume the number of blocks is known and provide an alternative method to estimate
the number of blocks. This represents another important line of work for model selection in
the stochastic block model.

The organization of the remainder of this article is as follows. In Section 2 we describe
the stochastic block model, then we describe the inferential task and the adjacency-spectral
partitioning procedure for the task when very little is known about the parameters of the
stochastic block model. In Section 3 ancillary results and bounds are proven, followed in
Section 4 by a proof of the consistency of our adjacency-spectral partitioning. However,
through Section 4, there is an extra assumption that the number of blocks K is known. In
Section 5 we provide a consistent estimator for K, and in Section 6 we prove the consistency
of an extended adjacency-spectral procedure that does not assume that K is known. Indeed,
at that point, the only aspect of the model parameters which is still assumed to be known
is just an upper bound for the rank of the communication probability matrix M.

Bickel et al. [2] mention the work of Rohe et al. [20] as an important step, and then opine
that "unfortunately this does not deal with the problem [of] how to pick a block model 
which is a good approximation to the nonparametric model." Taking these words to heart, 
our focus in this article is on showing a robustness in the consistency of spectral partitioning 
in the stochastic block model when using the adjacency matrix. Our focus is on removing 
the need to know a priori the parameters, and to still attain consistency in partitioning. This 
robustness opens the door to explore principled use of spectral techniques even for settings 
where the stochastic block model assumptions do not strictly hold, and we anticipate more 
future progress in consistency results for spectral partitioning in nonparametric models. 

We conclude the article with additional discussion of consistent estimation of K (Section 7),
illustrative simulations (Section 8), and a brief discussion (Section 9).

2 The model, the adjacency-spectral partitioning procedure, and its consistency



2.1 The stochastic block model 

The random graph setting in which we work is the stochastic block model, which has parameters
$K$, $\rho$, and $M$, where the positive integer $K$ is the number of blocks, the block probability vector
$\rho \in (0,1]^K$ satisfies $\sum_{k=1}^{K} \rho_k = 1$, and the communication probability matrix $M \in [0,1]^{K \times K}$
satisfies the model identifiability requirement that, for all distinct $p, q \in \{1, 2, \ldots, K\}$, either
it holds that $M_{p,\cdot} \neq M_{q,\cdot}$ (i.e. the $p$th and $q$th rows of $M$ are not equal) or $M_{\cdot,p} \neq M_{\cdot,q}$
(i.e. the $p$th and $q$th columns of $M$ are not equal). The model is defined (and the parameters
have roles) as follows:

There are $n$ vertices, labeled $1, 2, \ldots, n$, and they are each randomly assigned to blocks
labeled $1, 2, \ldots, K$ by a random block membership function $\tau : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, K\}$
such that for each vertex $i$ and block $k$, independently of the other vertices, the probability
that $\tau(i) = k$ is $\rho_k$.

Then there is a random adjacency matrix $A \in \{0,1\}^{n \times n}$ where, for all pairs of vertices
$i, j$ that are distinct, $A_{ij}$ is 1 or 0 according as there is an $i,j$ edge or not. Conditioned on
$\tau$, the probability of there being an $i,j$ edge is $M_{\tau(i),\tau(j)}$, independently of the other pairs
of vertices. Our analysis and results will cover both the undirected setting in which edges
are unordered pairs (in particular, $A$ and $M$ are symmetric) and also the directed setting in
which edges are ordered pairs (in particular, $A$ and $M$ are not necessarily symmetric). In
both settings the diagonals of $A$ are all 0's (i.e. there are no "loops" in the graph).
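As an illustration only (it is not part of the analysis), the generative description above can be sketched in Python with NumPy. The function name `sample_sbm`, the 0-based block labels, and the argument names are assumptions made for this sketch; `rho` is the block probability vector and `M` the communication probability matrix.

```python
import numpy as np

def sample_sbm(n, rho, M, directed=False, rng=None):
    """Draw block labels tau (0-based) and a loop-free adjacency matrix A."""
    rng = np.random.default_rng(rng)
    rho = np.asarray(rho, dtype=float)
    M = np.asarray(M, dtype=float)
    # Each vertex joins block k with probability rho[k], independently.
    tau = rng.choice(len(rho), size=n, p=rho)
    # P[i, j] = M[tau(i), tau(j)] is the conditional edge probability.
    P = M[np.ix_(tau, tau)]
    A = (rng.random((n, n)) < P).astype(int)
    if not directed:
        # Keep the strict upper triangle and mirror it (assumes M is symmetric).
        A = np.triu(A, k=1)
        A = A + A.T
    np.fill_diagonal(A, 0)  # no loops
    return tau, A
```

For example, `sample_sbm(200, [0.5, 0.5], [[0.8, 0.1], [0.1, 0.8]], rng=0)` draws an undirected two-block graph on 200 vertices.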

We assume that the parameters of the stochastic block model are not known, except
for one underlying assumption; namely, that a positive integer $R$ is known that satisfies
$\operatorname{rank} M \le R$. (Of course, $R$ may be taken to be $\operatorname{rank} M$ or $K$ if either of these happens to be
known.) However, for now through Section 4 we also assume that $K$ is known; in Section 5
we will provide a consistent estimator for $K$ if $K$ is not known, and then in Section 6 we
utilize this consistent estimator for $K$ to extend all of the previous procedures and results
to the scenario where $K$ is also not known (and then the only remaining assumption is our
one underlying assumption that a positive integer $R$ is known such that $\operatorname{rank} M \le R$).

Although the realized adjacency matrix $A$ is observed, the block membership function $\tau$ is
not observed and, indeed, the inferential task here is to estimate $\tau$. In Section 2.2, adjacency-spectral
partitioning is used to obtain a block assignment function $\hat{\tau} : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, K\}$
that serves as an estimator for $\tau$, up to permutation of the block labels
$1, 2, \ldots, K$ on the $K$ blocks. Then Theorem 1 in Section 2.3 asserts that almost always the
number of misassignments
$$\min_{\text{bijections } \pi : \{1,2,\ldots,K\} \to \{1,2,\ldots,K\}} \left| \{ j = 1, 2, \ldots, n : \tau(j) \neq \pi(\hat{\tau}(j)) \} \right|$$
is negligible.
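The number of misassignments is a minimum over all relabelings of the blocks. A minimal brute-force sketch follows; the function name `misassignments` and the 0-based labels are assumptions of this illustration, and the loop over all $K!$ bijections is only practical for small $K$.

```python
from itertools import permutations

def misassignments(tau, tau_hat, K):
    """Minimum over label bijections pi of #{j : tau(j) != pi(tau_hat(j))}."""
    best = len(tau)
    for pi in permutations(range(K)):
        # pi[h] is the true-label name assigned to estimated label h.
        errors = sum(t != pi[h] for t, h in zip(tau, tau_hat))
        best = min(best, errors)
    return best
```

An estimate that recovers the partition exactly but names the blocks differently therefore counts as zero misassignments.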

A more complicated scenario is where there are multiple "modalities of communication"
for the vertices. Specifically, instead of one probability communication matrix, there
are several probability communication matrices $M^{(1)}, M^{(2)}, \ldots, M^{(S)} \in [0,1]^{K \times K}$ which
are all parameters of the model, and there are corresponding random adjacency matrices
$A^{(1)}, A^{(2)}, \ldots, A^{(S)} \in \{0,1\}^{n \times n}$ such that for each modality $s = 1, 2, \ldots, S$ and for each
pair of vertices $i, j$ that are distinct, $A^{(s)}_{ij}$ is 1 with probability $M^{(s)}_{\tau(i),\tau(j)}$, independently of
the other pairs of vertices but possibly with dependence across the modalities. As above,
for model identifiability purposes we assume that, for each distinct $p, q \in \{1, 2, \ldots, K\}$,
there exists an $s \in \{1, 2, \ldots, S\}$ such that $M^{(s)}_{p,\cdot} \neq M^{(s)}_{q,\cdot}$ or $M^{(s)}_{\cdot,p} \neq M^{(s)}_{\cdot,q}$. Also, it is
assumed that we know positive integers $R^{(1)}, R^{(2)}, \ldots, R^{(S)}$ which are upper bounds on
$\operatorname{rank} M^{(1)}, \operatorname{rank} M^{(2)}, \ldots, \operatorname{rank} M^{(S)}$, respectively. We will also describe next in Section 2.2
how the adjacency-spectral partitioning procedure of that section can be modified for this
more complicated scenario so that Theorem 1 will still hold for it.

2.2 The adjacency-spectral partitioning procedure 

The adjacency-spectral partitioning procedure that we work with is given as follows:

First, take the realized adjacency matrix $A$, and compute a singular value decomposition
$A = [U|U_r](\Sigma \oplus \Sigma_r)[V|V_r]^T$ where $U, V \in \mathbb{R}^{n \times R}$, $U_r, V_r \in \mathbb{R}^{n \times (n-R)}$, $\Sigma \in \mathbb{R}^{R \times R}$, and
$\Sigma_r \in \mathbb{R}^{(n-R) \times (n-R)}$ are such that $[U|U_r]$ and $[V|V_r]$ are each real-orthogonal matrices, and
$\Sigma \oplus \Sigma_r$ is a diagonal matrix with its diagonals non-increasingly ordered $\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge \cdots \ge \sigma_n$.
Let $\sqrt{\Sigma} \in \mathbb{R}^{R \times R}$ denote the diagonal matrix whose diagonals are the nonnegative square roots
of the respective diagonals of $\Sigma$, and then compute $X := U\sqrt{\Sigma}$ and $Y := V\sqrt{\Sigma}$.

Then, cluster the rows of $X$ or $Y$ or $[X|Y]$ into at most $K$ clusters using the minimum
least squares criterion, as follows: If it is known that the rows of $M$ are pairwise not equal,
then compute $C \in \mathbb{R}^{n \times R}$ which minimizes $\|C - X\|_F$ over all matrices $C \in \mathbb{R}^{n \times R}$ such that
there are at most $K$ distinct-valued rows in $C$; otherwise, if it is known that the columns of $M$
are pairwise not equal, then compute $C \in \mathbb{R}^{n \times R}$ which minimizes $\|C - Y\|_F$ over all matrices
$C \in \mathbb{R}^{n \times R}$ such that there are at most $K$ distinct-valued rows in $C$; otherwise compute
$C \in \mathbb{R}^{n \times 2R}$ which minimizes $\|C - [X|Y]\|_F$ over all matrices $C \in \mathbb{R}^{n \times 2R}$ such that there are
at most $K$ distinct-valued rows in $C$. (Although our analysis will assume the use of this
minimum least squares criterion, note that popular clustering algorithms such as $K$-means
will also (empirically) produce good results for our inferential task of block assignment.)

The clusters obtained are estimates for the true blocks; i.e. define the block assignment
function $\hat{\tau} : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, K\}$ such that the inverse images $\{\hat{\tau}^{-1}(i) : i = 1, 2, \ldots, K\}$
partition the rows of $C$ (by index) so that rows in each part are equal-valued.
This concludes the procedure.
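The procedure can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation analyzed in this article: the function names are hypothetical, the rows of $[X|Y]$ are always clustered, and the exact minimum least squares criterion is replaced by Lloyd's $K$-means heuristic (with a deterministic farthest-point initialization), which need not find the global minimizer.

```python
import numpy as np

def spectral_embed(A, R):
    """Return X = U sqrt(Sigma) and Y = V sqrt(Sigma) from the top R singular triples of A."""
    U, s, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    root = np.sqrt(s[:R])
    return U[:, :R] * root, Vt[:R].T * root

def kmeans(Z, K, iters=100):
    """Lloyd's heuristic for the least squares criterion (not an exact minimizer)."""
    # Farthest-point initialization keeps the sketch deterministic.
    centers = [Z[0]]
    for _ in range(1, K):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each center as the mean of its cluster (keep it if the cluster is empty).
        new = np.array([Z[labels == k].mean(axis=0) if (labels == k).any()
                        else centers[k] for k in range(K)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def adjacency_spectral_partition(A, R, K):
    """Embed with an upper bound R on rank M, then cluster the rows of [X|Y] into K groups."""
    X, Y = spectral_embed(A, R)
    return kmeans(np.hstack([X, Y]), K)
```

Calling `adjacency_spectral_partition(A, R=3, K=2)` on a two-block graph whose communication probability matrix has rank 2 mimics the situation studied here, in which $R$ overestimates the rank.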

In the more complicated scenario of multiple modalities of communication, carry out the
above procedure in the same way, mutatis mutandis: For each modality $s$, compute the singular
value decomposition $A^{(s)} = [U^{(s)}|U_r^{(s)}](\Sigma^{(s)} \oplus \Sigma_r^{(s)})[V^{(s)}|V_r^{(s)}]^T$ for $U^{(s)}, V^{(s)} \in \mathbb{R}^{n \times R^{(s)}}$,
$U_r^{(s)}, V_r^{(s)} \in \mathbb{R}^{n \times (n - R^{(s)})}$, $\Sigma^{(s)} \in \mathbb{R}^{R^{(s)} \times R^{(s)}}$, and $\Sigma_r^{(s)} \in \mathbb{R}^{(n - R^{(s)}) \times (n - R^{(s)})}$ such that $[U^{(s)}|U_r^{(s)}]$ and
$[V^{(s)}|V_r^{(s)}]$ are each real-orthogonal matrices and $\Sigma^{(s)} \oplus \Sigma_r^{(s)}$ is a diagonal matrix with its
diagonals non-increasingly ordered, then define $X^{(s)} := U^{(s)}\sqrt{\Sigma^{(s)}}$ and $Y^{(s)} := V^{(s)}\sqrt{\Sigma^{(s)}}$,
and then, according as the rows of all $M^{(s)}$ are known to be distinct-valued, the columns
of all $M^{(s)}$ are known to be distinct-valued, or neither, compute $C$ which minimizes
$\|C - [X^{(1)}|X^{(2)}|\cdots|X^{(S)}]\|_F$ or $\|C - [Y^{(1)}|Y^{(2)}|\cdots|Y^{(S)}]\|_F$ or $\|C - [X^{(1)}|X^{(2)}|\cdots|X^{(S)}|Y^{(1)}|Y^{(2)}|\cdots|Y^{(S)}]\|_F$
such that there are at most $K$ distinct-valued rows in $C$, and then define $\hat{\tau}$ as the partition
of the vertices into $K$ blocks according to equal-valued corresponding rows in $C$.
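For several modalities, the only change before clustering is that the per-modality embeddings are placed side by side. A minimal sketch, assuming a hypothetical helper `multimodal_embedding` that takes a list of adjacency matrices and a list of rank bounds and always forms the full concatenation of the $X^{(s)}$ blocks followed by the $Y^{(s)}$ blocks:

```python
import numpy as np

def multimodal_embedding(As, Rs):
    """Concatenate [X^(1)|...|X^(S)|Y^(1)|...|Y^(S)] across modalities."""
    Xs, Ys = [], []
    for A, R in zip(As, Rs):
        U, s, Vt = np.linalg.svd(np.asarray(A, dtype=float))
        root = np.sqrt(s[:R])          # square roots of the top R singular values
        Xs.append(U[:, :R] * root)     # X^(s) = U^(s) sqrt(Sigma^(s))
        Ys.append(Vt[:R].T * root)     # Y^(s) = V^(s) sqrt(Sigma^(s))
    return np.hstack(Xs + Ys)
```

The rows of the returned matrix are then clustered into at most $K$ groups exactly as in the single-modality case.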



2.3 Consistency of the adjacency-spectral partitioning of Section 2.2



We consider a sequence of realizations of the stochastic block model given in Section 2.1 for
successive values $n = 1, 2, 3, \ldots$ with all stochastic block model parameters being fixed. In
this article, an event will be said to hold almost always if almost surely the event occurs
for all but a finite number of $n$. The following consistency result asserts that the number
of misassignments in the adjacency-spectral procedure of Section 2.2 is negligible; it will be
proven in Section 4.



Theorem 1. With the adjacency-spectral partitioning procedure of Section 2.2, for any fixed
$\epsilon > \frac{3}{4}$, the number of misassignments $\min_{\text{bijections } \pi : \{1,2,\ldots,K\} \to \{1,2,\ldots,K\}} |\{ j = 1, 2, \ldots, n : \tau(j) \neq \pi(\hat{\tau}(j)) \}|$
is almost always less than $n^{\epsilon}$.
Theorem 1 holds for all of the scenarios we described in Section 2.1, whether the edges are
directed or undirected, whether there is one modality of communication or multiple modali- 
ties. It also doesn't matter if for each successive n the partition function and adjacencies are 
re-realized for all vertices or if instead they are carried over from previous n's realization with 
just one new vertex randomly assigned to a block and just this vertex's adjacencies to the 
previous vertices being newly realized. (Note that if the partition function and adjacencies 
are re-realized for all vertices for successive n then when we invoke the Strong Law of Large 
Numbers we will be using the version of the Law in [H].) 

In Sussman et al. [21], it was shown that if $R = \operatorname{rank} M$ then the number of misassignments
of the adjacency-spectral procedure in Section 2.2 is almost always less than a constant times
$\log n$ (where the constant is a function of the model parameters). Indeed, both $\log n$ and $n^{\epsilon}$
(for $\epsilon < 1$), when divided by the number of vertices $n$, converge to zero, and in that sense we can now
say that whether $\operatorname{rank} M$ is known or is overestimated, either way the number of
misassignments of adjacency-spectral partitioning is negligible. This is a useful robustness
result.



3 Ancillary results 

3.1 Latent vectors and constants from the model parameters 

In this section we identify relevant constants $\alpha$, $\beta$, and $\gamma$ which depend on the specific values
of the stochastic block model parameters; these constants will be used in our analysis. We
also consider a particular decomposition of a model parameter (the communication probability
matrix $M$) into latent vectors which we may then usefully associate with the respective
blocks.

We first emphasize that knowing the values of these constants $\alpha$, $\beta$, and $\gamma$ which we are
about to identify and knowing the values of the latent vectors which we are about to define
are not at all needed to actually perform the adjacency-spectral clustering procedure of
Section 2.2, nor is any such knowledge needed in order to invoke and use the consistency
result Theorem 1. These constants and latent vectors will be used here in developing the
analysis and then proving Theorem 1.

The stochastic block model parameters are $K$, $\rho$, $M$; the constants $\alpha$, $\beta$, $\gamma$ are defined
as follows: Recall that $\rho_k > 0$ for all $k$; choose constant $\alpha > 0$ such that $\alpha < \rho_k$ for all $k$.
Next, choose matrices $\mu, \nu \in \mathbb{R}^{K \times \operatorname{rank} M}$ such that $M = \mu\nu^T$; indeed, such matrices $\mu$ and $\nu$
(exist and) can be easily computed using a singular value decomposition of $M$. It is trivial
to see that if any two rows of $M$ are not equal-valued then those two corresponding rows of
$\mu$ must be not equal-valued, and if any two columns of $M$ are not equal-valued then those
two corresponding rows of $\nu$ are not equal-valued. Choose constant $\beta > 0$ such that, for
all pairs of nonequal-valued rows $\mu_{k,\cdot}, \mu_{k',\cdot}$ of $\mu$ it holds that $\|\mu_{k,\cdot} - \mu_{k',\cdot}\|_2 \ge \beta$, and for all
pairs of nonequal-valued rows $\nu_{k,\cdot}, \nu_{k',\cdot}$ of $\nu$ it holds that $\|\nu_{k,\cdot} - \nu_{k',\cdot}\|_2 \ge \beta$. Lastly, since $\mu$
and $\nu$ are full column rank, choose constant $\gamma > 0$ such that the eigenvalues of $\mu^T\mu$ and $\nu^T\nu$
are all greater than $\gamma$.

The rows of $\mu$ and $\nu$ are respectively called left latent vectors and right latent vectors, and
are associated with the vertices as follows. The matrices $\mathcal{X} \in \mathbb{R}^{n \times \operatorname{rank} M}$ and $\mathcal{Y} \in \mathbb{R}^{n \times \operatorname{rank} M}$
are defined such that for all $i = 1, 2, \ldots, n$, $\mathcal{X}_{i,\cdot} := \mu_{\tau(i),\cdot}$ and $\mathcal{Y}_{i,\cdot} := \nu_{\tau(i),\cdot}$. The significance
of the latent vectors is that for any pair of distinct vertices $i$ and $j$ the probability of an $i,j$
edge is the inner product of the left latent vector associated with $i$ (which is $\mathcal{X}_{i,\cdot}$) with the
right latent vector associated with $j$ (which is $\mathcal{Y}_{j,\cdot}$). Of course, these latent vectors are not
observed; indeed, $M$ is not known and $\tau$ is not observed.

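Finally, let $\mathcal{X}\mathcal{Y}^T = \mathcal{U}\Lambda\mathcal{V}^T$ be a singular value decomposition, i.e. $\mathcal{U}, \mathcal{V} \in \mathbb{R}^{n \times \operatorname{rank} M}$ each
have orthonormal columns and $\Lambda \in \mathbb{R}^{\operatorname{rank} M \times \operatorname{rank} M}$ is a diagonal matrix with diagonals ordered
in nonincreasing order $\varsigma_1 \ge \varsigma_2 \ge \varsigma_3 \ge \cdots \ge \varsigma_{\operatorname{rank} M}$. It is useful to observe that $\mathcal{X}(\mathcal{Y}^T\mathcal{V}\Lambda^{-1}) = \mathcal{U}$
and $(\Lambda^{-1}\mathcal{U}^T\mathcal{X})\mathcal{Y}^T = \mathcal{V}^T$ imply that rows of $\mathcal{X}$ which are equal-valued correspond to rows
of $\mathcal{U}$ that are equal-valued, and rows of $\mathcal{Y}$ which are equal-valued correspond to rows of $\mathcal{V}$
that are equal-valued.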
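The latent-vector construction can be checked numerically. The sketch below is an illustration under stated assumptions: the helper name `latent_positions`, the tolerance used to decide the numerical rank, and the 0-based block labels are not from this article. It factors $M$ through its singular value decomposition and repeats the rows of the two factors according to the block memberships.

```python
import numpy as np

def latent_positions(M, tau, tol=1e-10):
    """Return (mu, nu, Xcal, Ycal) with M = mu nu^T, Xcal[i] = mu[tau(i)], Ycal[i] = nu[tau(i)]."""
    U, s, Vt = np.linalg.svd(np.asarray(M, dtype=float))
    r = int((s > tol).sum())                     # numerical rank of M
    root = np.sqrt(s[:r])
    mu, nu = U[:, :r] * root, Vt[:r].T * root    # both are K x rank(M)
    tau = np.asarray(tau)
    return mu, nu, mu[tau], nu[tau]
```

With these matrices, entry $(i,j)$ of `Xcal @ Ycal.T` equals `M[tau[i], tau[j]]`, which is the edge probability for a pair of distinct vertices.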

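In the more complicated scenario of more than one communication modality these definitions
are made in the same way, mutatis mutandis: For all modalities $s$, choose
$\mu^{(s)}, \nu^{(s)} \in \mathbb{R}^{K \times \operatorname{rank} M^{(s)}}$ such that $M^{(s)} = \mu^{(s)}\nu^{(s)T}$, then choose $\beta > 0$ such that for every modality
$s$ and all pairs of nonequal-valued rows $\mu^{(s)}_{k,\cdot}, \mu^{(s)}_{k',\cdot}$ of $\mu^{(s)}$ it holds that $\|\mu^{(s)}_{k,\cdot} - \mu^{(s)}_{k',\cdot}\|_2 \ge \beta$,
and for all pairs of nonequal-valued rows $\nu^{(s)}_{k,\cdot}, \nu^{(s)}_{k',\cdot}$ of $\nu^{(s)}$ it holds that $\|\nu^{(s)}_{k,\cdot} - \nu^{(s)}_{k',\cdot}\|_2 \ge \beta$.
Choose constant $\gamma > 0$ such that all eigenvalues of $\mu^{(s)T}\mu^{(s)}$ and $\nu^{(s)T}\nu^{(s)}$ for all modalities
$s$ are greater than $\gamma$. Then, for each modality $s$, define the rows of $\mathcal{X}^{(s)} \in \mathbb{R}^{n \times \operatorname{rank} M^{(s)}}$
and $\mathcal{Y}^{(s)} \in \mathbb{R}^{n \times \operatorname{rank} M^{(s)}}$ to be the rows from $\mu^{(s)}$ and $\nu^{(s)}$, respectively, corresponding to the
blocks of the respective vertices, and then define $\mathcal{U}^{(s)}$, $\mathcal{V}^{(s)}$, and $\Lambda^{(s)}$ (with ordered diagonals
$\varsigma^{(s)}_1, \varsigma^{(s)}_2, \ldots, \varsigma^{(s)}_{\operatorname{rank} M^{(s)}}$) to form singular value decompositions $\mathcal{X}^{(s)}\mathcal{Y}^{(s)T} = \mathcal{U}^{(s)}\Lambda^{(s)}\mathcal{V}^{(s)T}$.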
3.2 Bounds

In this section we prove a number of bounds involving $A$, $\mathcal{X}\mathcal{Y}^T$, their singular values, and
matrices constructed from components of their singular value decompositions. These bounds
will then be used in Section 4 to prove Theorem 1, which asserts the consistency of the
adjacency-spectral partitioning procedure of Section 2.2.

The results in this section are stated and proved for both the directed setting and the
undirected setting of Section 2.1. However, we directly treat only the setting with one
modality of communication; if there are multiple modalities of communication then all of
the statements and proofs in this section apply to each modality separately. Some of the
results in this section can be found in similar or different form in [21]; we include all necessary
results for completeness, and in order to incorporate many substantive changes needed for
treatment of this article's focus.

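Lemma 2. It almost always holds that $\|AA^T - \mathcal{X}\mathcal{Y}^T(\mathcal{X}\mathcal{Y}^T)^T\|_F \le \sqrt{3}\, n^{3/2} \sqrt{\log n}$ and it
almost always holds that $\|A^TA - (\mathcal{X}\mathcal{Y}^T)^T\mathcal{X}\mathcal{Y}^T\|_F \le \sqrt{3}\, n^{3/2} \sqrt{\log n}$.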

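Proof: Let $\mathcal{X}_{i,\cdot}$ and $\mathcal{Y}_{i,\cdot}$ denote the $i$th rows of $\mathcal{X}$ and $\mathcal{Y}$, respectively. For all $i \neq j$,

$$[AA^T]_{ij} - [\mathcal{X}\mathcal{Y}^T(\mathcal{X}\mathcal{Y}^T)^T]_{ij} = \sum_{l \neq i,j} \left( A_{il}A_{jl} - \mathcal{X}_{i,\cdot}\mathcal{Y}_{l,\cdot}^T \mathcal{X}_{j,\cdot}\mathcal{Y}_{l,\cdot}^T \right) - \mathcal{X}_{i,\cdot}\mathcal{Y}_{i,\cdot}^T \mathcal{X}_{j,\cdot}\mathcal{Y}_{i,\cdot}^T - \mathcal{X}_{i,\cdot}\mathcal{Y}_{j,\cdot}^T \mathcal{X}_{j,\cdot}\mathcal{Y}_{j,\cdot}^T. \tag{1}$$

Hoeffding's inequality states that if $T$ is the sum of $m$ independent random variables that take
values in the interval $[0,1]$, and if $c > 0$, then $P[(T - E[T])^2 \ge c] \le 2e^{-2c/m}$. Thus, for all $i, j$
such that $i \neq j$, if we condition on $\mathcal{X}$ and $\mathcal{Y}$, we have for $l \neq i, j$ that the $m := n - 2$ random
variables $A_{il}A_{jl}$ have distribution Bernoulli$(\mathcal{X}_{i,\cdot}\mathcal{Y}_{l,\cdot}^T \mathcal{X}_{j,\cdot}\mathcal{Y}_{l,\cdot}^T)$ and are independent. Thus, taking
$c = 2(n-2)\log n$ in Equation (1), we obtain that

$$P\left[ \left( [AA^T]_{ij} - [\mathcal{X}\mathcal{Y}^T(\mathcal{X}\mathcal{Y}^T)^T]_{ij} \right)^2 \ge 2(n-2)\log n + 4n - 4 \right] \le \frac{2}{n^4}. \tag{2}$$

Integrating Equation (2) over $\mathcal{X}$ and $\mathcal{Y}$ yields that Equation (2) is true unconditionally. By
probability subadditivity, summing over $i, j$ such that $i \neq j$ in Equation (2), we obtain that

$$P\left[ \sum_{i \neq j} \left( [AA^T]_{ij} - [\mathcal{X}\mathcal{Y}^T(\mathcal{X}\mathcal{Y}^T)^T]_{ij} \right)^2 \ge 2n(n-1)(n-2)\log n + 4n(n-1) \right] \le \frac{2}{n^2}. \tag{3}$$

By the Borel-Cantelli Lemma (which states that if a sequence of events have probabilities
with bounded sum then almost always the events do not occur) we obtain from Equation
(3) that almost always

$$\sum_{i \neq j} \left( [AA^T]_{ij} - [\mathcal{X}\mathcal{Y}^T(\mathcal{X}\mathcal{Y}^T)^T]_{ij} \right)^2 \le 2n^3 \log n$$

and thus almost always $\|AA^T - \mathcal{X}\mathcal{Y}^T(\mathcal{X}\mathcal{Y}^T)^T\|_F^2 \le 3n^3 \log n$ because each of the diagonals
of $AA^T - \mathcal{X}\mathcal{Y}^T(\mathcal{X}\mathcal{Y}^T)^T$ is bounded in absolute value by $n$. The very same argument holds
mutatis mutandis for $\|A^TA - (\mathcal{X}\mathcal{Y}^T)^T\mathcal{X}\mathcal{Y}^T\|_F^2$. □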
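The bound of Lemma 2 is asymptotic, but it can be examined on a single simulated graph. In the following sketch the function name `lemma2_gap` is hypothetical and the matrix `P`, with entries `M[tau(i), tau(j)]`, plays the role of the matrix of latent inner products; a single finite-$n$ comparison is only an illustration, not evidence for the almost-always statement.

```python
import numpy as np

def lemma2_gap(A, P):
    """Return (||A A^T - P P^T||_F, sqrt(3) n^{3/2} sqrt(log n)) for an n x n pair (A, P)."""
    n = A.shape[0]
    lhs = np.linalg.norm(A @ A.T - P @ P.T, ord="fro")
    rhs = np.sqrt(3.0) * n ** 1.5 * np.sqrt(np.log(n))
    return lhs, rhs
```

Comparing the two returned values for growing $n$ gives a feel for how much slack the bound leaves.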

The next lemma, Lemma 3, provides bounds on the singular values $\varsigma_1, \varsigma_2, \varsigma_3, \ldots$ of the matrix
$\mathcal{X}\mathcal{Y}^T$ and then, in Corollary 4, we obtain bounds on the singular values $\sigma_1, \sigma_2, \sigma_3, \ldots$ of the
matrix $A$. Recall that the rank of $\mathcal{X}\mathcal{Y}^T$ is (almost always) $\operatorname{rank} M$, while $A$ may in fact have
rank $n$.

Lemma 3. It almost always holds that $\alpha\gamma n \le \varsigma_{\operatorname{rank} M}$, and it always holds that $\varsigma_1 \le n$.

Proof: Because $\mathcal{X}\mathcal{Y}^T$ is in $[0,1]^{n \times n}$, the nonnegative matrix $\mathcal{X}\mathcal{Y}^T(\mathcal{X}\mathcal{Y}^T)^T$ has all of its
entries bounded by $n$, thus all of its row sums bounded by $n^2$, and thus its spectral radius
$\varsigma_1^2$ is bounded by $n^2$, i.e. we have $\varsigma_1 \le n$ as desired.
Next, for all $k = 1, 2, \ldots, K$, let random variable $n_k$ denote the number of vertices in block
$k$. The nonzero eigenvalues of $(\mathcal{X}\mathcal{Y}^T)(\mathcal{X}\mathcal{Y}^T)^T = \mathcal{X}\mathcal{Y}^T\mathcal{Y}\mathcal{X}^T$ are the same as the nonzero
eigenvalues of $\mathcal{Y}^T\mathcal{Y}\mathcal{X}^T\mathcal{X}$. By the definition of $\alpha$ and the Law of Large Numbers, almost
always $n_k > \alpha n$ for each $k$, thus we express $\mathcal{X}^T\mathcal{X} = \sum_{k=1}^{K} n_k \mu_{k,\cdot}^T \mu_{k,\cdot} = \alpha n \mu^T\mu + \sum_{k=1}^{K} (n_k - \alpha n) \mu_{k,\cdot}^T \mu_{k,\cdot}$
as the sum of two positive semidefinite matrices and obtain that the minimum
eigenvalue of $\mathcal{X}^T\mathcal{X}$ is at least $\alpha\gamma n$. Similarly the minimum eigenvalue of $\mathcal{Y}^T\mathcal{Y}$ is at least
$\alpha\gamma n$. The minimum eigenvalue of a product of positive semidefinite matrices is at least
the product of their minimum eigenvalues [27], thus the minimum eigenvalue of $\mathcal{Y}^T\mathcal{Y}\mathcal{X}^T\mathcal{X}$
(which is equal to $\varsigma_{\operatorname{rank} M}^2$) is at least $\alpha\gamma n \cdot \alpha\gamma n$, as desired. □

Corollary 4. It almost always holds that $\alpha\gamma n \le \sigma_{\operatorname{rank} M}$, it always holds that $\sigma_1 \le n$, and it
almost always holds that $\sigma_{\operatorname{rank} M + 1} \le 3^{1/4} n^{3/4} \log^{1/4} n$.

Proof: By Lemma 2 and Weyl's Lemma (e.g., see [13]), we obtain that for all $m$ it almost always
holds that $|\sigma_m^2 - \varsigma_m^2| \le \|AA^T - \mathcal{X}\mathcal{Y}^T(\mathcal{X}\mathcal{Y}^T)^T\|_F \le \sqrt{3}\, n^{3/2} \sqrt{\log n}$. For all $m > \operatorname{rank} M$,
the $m$th singular value of $\mathcal{X}\mathcal{Y}^T$ is zero, thus almost always $\sigma_{\operatorname{rank} M + 1} \le 3^{1/4} n^{3/4} \log^{1/4} n$.
Lemma 3 can in fact be strengthened to show that there is a $\delta > 0$ such that almost
always $(\alpha\gamma + \delta) n \le \varsigma_{\operatorname{rank} M}$, hence $(\alpha\gamma + \delta)^2 n^2 \le \varsigma_{\operatorname{rank} M}^2$, thus we have almost always that
$(\alpha\gamma)^2 n^2 \le \sigma_{\operatorname{rank} M}^2$, as desired. Showing that $\sigma_1 \le n$ is done the same way that $\varsigma_1 \le n$ was
shown in Lemma 3. □

It is worth noting that a consequence of Corollary 4 is that, for any chosen real number
$\omega$ such that $\frac{3}{4} < \omega < 1$, the random variable which counts the number of $\sigma_1, \sigma_2, \ldots, \sigma_n$ which
are greater than $n^{\omega}$ is a consistent estimator for $\operatorname{rank} M$ (is almost always equal to $\operatorname{rank} M$).
Our goal in this article is to show a robustness result, that "overestimating" $\operatorname{rank} M$ with $R$ in
the adjacency-spectral partitioning procedure does not ruin the consistency of the procedure.
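The singular-value count just described is simple to compute. In the sketch below the function name `estimate_rank` and the default exponent 0.8 are illustrative assumptions; Corollary 4 bounds the singular values beyond the rank of $M$ by a multiple of $n^{3/4}\log^{1/4} n$, which is why a threshold of the form $n^{\omega}$ with $\omega$ strictly between $3/4$ and 1 separates the two groups for large $n$.

```python
import numpy as np

def estimate_rank(A, omega=0.8):
    """Count the singular values of A that exceed n**omega."""
    n = A.shape[0]
    sigma = np.linalg.svd(np.asarray(A, dtype=float), compute_uv=False)
    return int((sigma > n ** omega).sum())
```

For moderate $n$ the outcome can depend on the chosen exponent, since the guarantee is asymptotic.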



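Recall from Section 2.2 the singular value decomposition $A = [U|U_r](\Sigma \oplus \Sigma_r)[V|V_r]^T$. At
this point it will be useful to further partition $U = [U_\ell|U_c]$, $V = [V_\ell|V_c]$, and $\Sigma = \Sigma_\ell \oplus \Sigma_c$ where
$U_\ell, V_\ell \in \mathbb{R}^{n \times \operatorname{rank} M}$, $U_c, V_c \in \mathbb{R}^{n \times (R - \operatorname{rank} M)}$, $\Sigma_\ell \in \mathbb{R}^{\operatorname{rank} M \times \operatorname{rank} M}$, and $\Sigma_c \in \mathbb{R}^{(R - \operatorname{rank} M) \times (R - \operatorname{rank} M)}$.
(The subscripts $\ell, c, r$ are mnemonics for "left", "center", and "right", respectively.) Also
define the matrices $X_\ell := U_\ell\sqrt{\Sigma_\ell}$, $Y_\ell := V_\ell\sqrt{\Sigma_\ell}$, $X_c := U_c\sqrt{\Sigma_c}$, $Y_c := V_c\sqrt{\Sigma_c}$, $X_r := U_r\sqrt{\Sigma_r}$,
and $Y_r := V_r\sqrt{\Sigma_r}$. Referring back to the definition of $X$ and $Y$ in Section 2.2, note that
$X = [X_\ell|X_c]$ and $Y = [Y_\ell|Y_c]$.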



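From the definition of $\beta$ in Section 3.1 it follows that for any $i$ and $j$ such that $\mathcal{X}_{i,\cdot} \neq \mathcal{X}_{j,\cdot}$
(or $\mathcal{Y}_{i,\cdot} \neq \mathcal{Y}_{j,\cdot}$) it holds that $\|\mathcal{X}_{i,\cdot} - \mathcal{X}_{j,\cdot}\|_2 \ge \beta$ (respectively, $\|\mathcal{Y}_{i,\cdot} - \mathcal{Y}_{j,\cdot}\|_2 \ge \beta$). The next
result shows how this separation extends to the rows of the singular vectors of $\mathcal{X}\mathcal{Y}^T$.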

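Lemma 5. Almost always the following are true:

For all $i, j$ such that $\|\mathcal{X}_{i,\cdot} - \mathcal{X}_{j,\cdot}\|_2 \ge \beta$, it holds that $\|\mathcal{U}_{i,\cdot} - \mathcal{U}_{j,\cdot}\|_2 \ge \beta\sqrt{\frac{\alpha\gamma}{n}}$.

For all $i, j$ such that $\|\mathcal{Y}_{i,\cdot} - \mathcal{Y}_{j,\cdot}\|_2 \ge \beta$, it holds that $\|\mathcal{V}_{i,\cdot} - \mathcal{V}_{j,\cdot}\|_2 \ge \beta\sqrt{\frac{\alpha\gamma}{n}}$.

For all $i, j$ such that $\|\mathcal{X}_{i,\cdot} - \mathcal{X}_{j,\cdot}\|_2 \ge \beta$, it holds that $\|\mathcal{U}_{i,\cdot}Q\sqrt{\Sigma_\ell} - \mathcal{U}_{j,\cdot}Q\sqrt{\Sigma_\ell}\|_2 \ge \alpha\beta\gamma$ for any
orthogonal matrix $Q \in \mathbb{R}^{\operatorname{rank} M \times \operatorname{rank} M}$.

For all $i, j$ such that $\|\mathcal{Y}_{i,\cdot} - \mathcal{Y}_{j,\cdot}\|_2 \ge \beta$, it holds that $\|\mathcal{V}_{i,\cdot}Q\sqrt{\Sigma_\ell} - \mathcal{V}_{j,\cdot}Q\sqrt{\Sigma_\ell}\|_2 \ge \alpha\beta\gamma$ for any
orthogonal matrix $Q \in \mathbb{R}^{\operatorname{rank} M \times \operatorname{rank} M}$.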



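Proof: Recall the singular value decomposition $\mathcal{X}\mathcal{Y}^T = \mathcal{U}\Lambda\mathcal{V}^T$ from Section 3.1 (where
$\mathcal{U}, \mathcal{V} \in \mathbb{R}^{n \times \operatorname{rank} M}$ have orthonormal columns and $\Lambda \in \mathbb{R}^{\operatorname{rank} M \times \operatorname{rank} M}$ is diagonal). Let
$\mathcal{Y}^T\mathcal{Y} = W\Delta^2W^T$ be a spectral decomposition; that is, $W \in \mathbb{R}^{\operatorname{rank} M \times \operatorname{rank} M}$ is orthogonal and
$\Delta \in \mathbb{R}^{\operatorname{rank} M \times \operatorname{rank} M}$ is a diagonal matrix with positive diagonal entries. Note that

$$(\mathcal{X}W\Delta)(\mathcal{X}W\Delta)^T = \mathcal{X}W\Delta^2W^T\mathcal{X}^T = \mathcal{X}\mathcal{Y}^T\mathcal{Y}\mathcal{X}^T = \mathcal{U}\Lambda\mathcal{V}^T\mathcal{V}\Lambda\mathcal{U}^T = (\mathcal{U}\Lambda)(\mathcal{U}\Lambda)^T. \tag{4}$$

For any $i, j$ distinct, let $e \in \mathbb{R}^n$ denote the vector with all zeros except for the value 1 in the
$i$th coordinate and the value $-1$ in the $j$th coordinate. By Equation (4), we thus have that

$$\|(\mathcal{X}W\Delta)_{i,\cdot} - (\mathcal{X}W\Delta)_{j,\cdot}\|_2^2 = e^T(\mathcal{X}W\Delta)(\mathcal{X}W\Delta)^Te = e^T(\mathcal{U}\Lambda)(\mathcal{U}\Lambda)^Te = \|(\mathcal{U}\Lambda)_{i,\cdot} - (\mathcal{U}\Lambda)_{j,\cdot}\|_2^2. \tag{5}$$

From Lemma 3 and its proof, we have that the diagonals of $\Delta$ are almost always at least
$\sqrt{\alpha\gamma n}$ and that the diagonals of $\Lambda$ are at most $n$. Using this and Equation (5), we get that
if $i, j$ are such that $\|\mathcal{X}_{i,\cdot} - \mathcal{X}_{j,\cdot}\|_2 \ge \beta$ then it holds that

$$\beta \le \|\mathcal{X}_{i,\cdot} - \mathcal{X}_{j,\cdot}\|_2 = \|(\mathcal{X}W)_{i,\cdot} - (\mathcal{X}W)_{j,\cdot}\|_2 \le \frac{1}{\sqrt{\alpha\gamma n}}\|(\mathcal{X}W\Delta)_{i,\cdot} - (\mathcal{X}W\Delta)_{j,\cdot}\|_2 = \frac{1}{\sqrt{\alpha\gamma n}}\|(\mathcal{U}\Lambda)_{i,\cdot} - (\mathcal{U}\Lambda)_{j,\cdot}\|_2 \le \frac{1}{\sqrt{\alpha\gamma n}}\, n\, \|\mathcal{U}_{i,\cdot} - \mathcal{U}_{j,\cdot}\|_2.$$

Thus $\|\mathcal{U}_{i,\cdot} - \mathcal{U}_{j,\cdot}\|_2 \ge \beta\sqrt{\frac{\alpha\gamma}{n}}$, as desired. Now, if $Q \in \mathbb{R}^{\operatorname{rank} M \times \operatorname{rank} M}$ is any orthogonal matrix
then, by Corollary 4,

$$\|\mathcal{U}_{i,\cdot} - \mathcal{U}_{j,\cdot}\|_2 = \|\mathcal{U}_{i,\cdot}Q - \mathcal{U}_{j,\cdot}Q\|_2 \le \frac{1}{\sqrt{\alpha\gamma n}}\|\mathcal{U}_{i,\cdot}Q\sqrt{\Sigma_\ell} - \mathcal{U}_{j,\cdot}Q\sqrt{\Sigma_\ell}\|_2$$

which, together with $\|\mathcal{U}_{i,\cdot} - \mathcal{U}_{j,\cdot}\|_2 \ge \beta\sqrt{\frac{\alpha\gamma}{n}}$, implies $\|\mathcal{U}_{i,\cdot}Q\sqrt{\Sigma_\ell} - \mathcal{U}_{j,\cdot}Q\sqrt{\Sigma_\ell}\|_2 \ge \alpha\beta\gamma$, as
desired. The same argument applies mutatis mutandis for $\|\mathcal{Y}_{i,\cdot} - \mathcal{Y}_{j,\cdot}\|_2 \ge \beta$. □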

In the following, the sum of vector subspaces will refer to the subspace consisting of all 
sums of vectors from the summand subspaces; equivalently, it will be the smallest subspace 
containing all of the summand subspaces. The following theorem is due to Davis and Kahan 
[8] in the form presented in [20].

Theorem 6. (Davis and Kahan) Let $H, H' \in \mathbb{R}^{n \times n}$ be symmetric, suppose $S \subset \mathbb{R}$ is an
interval, and suppose for some positive integer $d$ that $W, W' \in \mathbb{R}^{n \times d}$ are such that the
columns of $W$ form an orthonormal basis for the sum of the eigenspaces of $H$ associated
with the eigenvalues of $H$ in $S$, and the columns of $W'$ form an orthonormal basis for the
sum of the eigenspaces of $H'$ associated with the eigenvalues of $H'$ in $S$. Let $\delta$ be the minimum
distance between any eigenvalue of $H$ in $S$ and any eigenvalue of $H$ not in $S$. Then there
exists an orthogonal matrix $Q \in \mathbb{R}^{d \times d}$ such that $\|WQ - W'\|_F \le \frac{\sqrt{2}}{\delta}\|H - H'\|_F$.

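Corollary 7. There almost always exist real orthogonal matrices $Q_U, Q_V \in \mathbb{R}^{\mathrm{rank}M \times \mathrm{rank}M}$ which satisfy $\|\mathcal{U}Q_U - U_\ell\|_F \le \frac{\sqrt{6}}{\alpha^2\gamma^2} \cdot \sqrt{\frac{\log n}{n}}$ and $\|\mathcal{V}Q_V - V_\ell\|_F \le \frac{\sqrt{6}}{\alpha^2\gamma^2} \cdot \sqrt{\frac{\log n}{n}}$. Furthermore, it holds that $\|\mathcal{X}_\ell - X_\ell\|_F \le \frac{\sqrt{6}}{\alpha^2\gamma^2} \cdot \sqrt{\log n}$ and $\|\mathcal{Y}_\ell - Y_\ell\|_F \le \frac{\sqrt{6}}{\alpha^2\gamma^2} \cdot \sqrt{\log n}$, where we define $\mathcal{X}_\ell := \mathcal{U}Q_U\sqrt{\Sigma_\ell}$ and $\mathcal{Y}_\ell := \mathcal{V}Q_V\sqrt{\Sigma_\ell}$.

Proof: Take $S$ in Theorem 6 to be the interval $(\frac{1}{2}\alpha^2\gamma^2n^2, \infty)$. By Lemma 3 and Corollary 4 we have almost always that precisely the greatest $\mathrm{rank}M$ eigenvalues of each of $H := \mathcal{X}\mathcal{Y}^T(\mathcal{X}\mathcal{Y}^T)^T$ and $H' := AA^T$ are in $S$. By Lemma 3, almost always $\delta \ge \alpha^2\gamma^2n^2$ (for the $\delta$ in Theorem 6) so, by Lemma 2, almost always $\frac{\sqrt{2}}{\delta}\|H - H'\|_F \le \frac{\sqrt{2}}{\alpha^2\gamma^2n^2}\sqrt{3}\,n^{3/2}\sqrt{\log n}$. With this, the first statements of Corollary 7 follow from the Davis and Kahan Theorem (Theorem 6). The last statements of Corollary 7 follow from postmultiplying $\mathcal{U}Q_U - U_\ell$ with $\sqrt{\Sigma_\ell}$ and then using Corollary 4 and the definition of $\mathcal{X}_\ell$. □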
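
As a reading aid, the arithmetic behind the two bounds of Corollary 7 can be written out explicitly. This is a sketch that takes as given the bound $\|H - H'\|_F \le \sqrt{3}\,n^{3/2}\sqrt{\log n}$ used in the proof, assumes $X_\ell = U_\ell\sqrt{\Sigma_\ell}$ as in the procedure of Section 2.2, and uses the fact that no singular value of a 0/1 adjacency matrix on $n$ vertices exceeds $n$:

\[ \frac{\sqrt{2}}{\delta}\|H - H'\|_F \le \frac{\sqrt{2}}{\alpha^2\gamma^2n^2} \cdot \sqrt{3}\,n^{3/2}\sqrt{\log n} = \frac{\sqrt{6}}{\alpha^2\gamma^2}\sqrt{\frac{\log n}{n}}, \]

\[ \|\mathcal{X}_\ell - X_\ell\|_F = \|(\mathcal{U}Q_U - U_\ell)\sqrt{\Sigma_\ell}\|_F \le \sqrt{n} \cdot \frac{\sqrt{6}}{\alpha^2\gamma^2}\sqrt{\frac{\log n}{n}} = \frac{\sqrt{6}}{\alpha^2\gamma^2}\sqrt{\log n}. \]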



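Now, choose $\mathcal{U}_c \in \mathbb{R}^{n \times (R - \mathrm{rank}M)}$ and $\mathcal{U}_r \in \mathbb{R}^{n \times (n - R)}$ such that $[\mathcal{U}|\mathcal{U}_c|\mathcal{U}_r] \in \mathbb{R}^{n \times n}$ is an orthogonal matrix. In particular, note that the columns of $\mathcal{U}_c$ together with the columns of $\mathcal{U}_r$ form an orthonormal basis for the eigenspace associated with eigenvalue 0 in the matrix $H := \mathcal{X}\mathcal{Y}^T(\mathcal{X}\mathcal{Y}^T)^T$.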

Corollary 8. There almost always exists a real orthogonal matrix $Q \in \mathbb{R}^{(n - \mathrm{rank}M) \times (n - \mathrm{rank}M)}$ such that $\|[\mathcal{U}_c|\mathcal{U}_r]Q - [U_c|U_r]\|_F \le \frac{\sqrt{6}}{\alpha^2\gamma^2} \cdot \sqrt{\frac{\log n}{n}}$. Define $\mathcal{X}_c \in \mathbb{R}^{n \times (R - \mathrm{rank}M)}$ and $\mathcal{X}_r \in \mathbb{R}^{n \times (n - R)}$ such that $[\mathcal{X}_c|\mathcal{X}_r] := [\mathcal{U}_c|\mathcal{U}_r]Q\sqrt{\Sigma_c \oplus \Sigma_r}$. Then $\|[\mathcal{X}_c|\mathcal{X}_r] - [X_c|X_r]\|_F \le \frac{\sqrt{2}\,3^{5/8}}{\alpha^2\gamma^2} \cdot n^{-1/8}\log^{5/8} n$.

Proof: The first statement of Corollary 8 is proven in the exact manner that we proved Corollary 7, except that $S$ is instead taken to be the complement of $(\frac{1}{2}\alpha^2\gamma^2n^2, \infty)$. The second statement of Corollary 8 follows by postmultiplying $[\mathcal{U}_c|\mathcal{U}_r]Q - [U_c|U_r]$ with $\sqrt{\Sigma_c \oplus \Sigma_r}$ and then using Corollary 4 and the definitions of $\mathcal{X}_c$ and $\mathcal{X}_r$. □

Note 9. Almost always it holds that $\|\mathcal{X}_c\|_F \le \sqrt{R - \mathrm{rank}M}\; 3^{1/8}n^{3/8}\log^{1/8} n$.

Proof: It is clear (with the matrix $Q$ from Corollary 8) that $[\mathcal{U}_c|\mathcal{U}_r]Q$ has orthonormal columns, hence the Frobenius norm of the first $R - \mathrm{rank}M$ columns is exactly $\sqrt{R - \mathrm{rank}M}$. The result follows from postmultiplying these columns by $\sqrt{\Sigma_c}$ and using Corollary 4. □



4 Proof of Theorem 1, consistency of the adjacency-spectral procedure of Section 2.2

In this section we prove Theorem 1. Assuming that the number of blocks $K$ is known and that an upper bound $R$ is known for $\mathrm{rank}M$, Theorem 1 states that, for the adjacency-spectral procedure described in Section 2.2 and for any fixed real number $\epsilon > \frac{3}{4}$, the number of misassignments $\min_{\text{bijections } \pi:\{1,2,\ldots,K\}\to\{1,2,\ldots,K\}} |\{j = 1,2,\ldots,n : \tau(j) \ne \pi(\hat\tau(j))\}|$ is almost always less than $n^\epsilon$. We focus first on the scenario where there is a single modality of communication, and we also suppose for now that it is known that the rows of $M$ are pairwise nonequal.
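
To make the misassignment count concrete, here is a minimal Python sketch that evaluates the minimum over bijections by brute force. The function name `misassignments` and the 0-based labels are illustrative assumptions, not notation from the text, and the enumeration over all $K!$ bijections is only practical for small $K$.

```python
from itertools import permutations

def misassignments(tau, tau_hat, K):
    """Minimum over bijections pi of #{j : tau[j] != pi(tau_hat[j])}.

    tau and tau_hat are sequences of block labels in 0..K-1
    (0-based labels are an assumption made for this sketch).
    """
    best = len(tau)
    for pi in permutations(range(K)):      # pi[k] is the image of label k
        errors = sum(1 for t, th in zip(tau, tau_hat) if t != pi[th])
        best = min(best, errors)
    return best

# Toy example: the assignment agrees with the truth up to relabeling,
# except for one vertex.
tau     = [0, 0, 0, 1, 1, 2, 2, 2]
tau_hat = [2, 2, 1, 1, 1, 0, 0, 0]
print(misassignments(tau, tau_hat, 3))
```

The minimum over bijections is what makes the count insensitive to how the estimated blocks happen to be numbered.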



First, an observation: Recall from Section 3.1 that, for each vertex, the block that the vertex is a member of via the block membership function $\tau$ is characterized by which of the $K$ distinct-valued rows of $\mathcal{U}$ the vertex is associated with in $\mathcal{U}$. In Corollary 7 we defined $\mathcal{X}_\ell := \mathcal{U}Q_U\sqrt{\Sigma_\ell}$. Because $\mathcal{X}_\ell$ is $\mathcal{U}$ times an invertible matrix (since $\sqrt{\Sigma_\ell}$ is almost always invertible by Corollary 4), the block that the vertex is truly a member of is thus characterized by which of the $K$ distinct-valued rows of $\mathcal{X}_\ell$ the vertex is associated with in $\mathcal{X}_\ell$. Also recall that the block which the vertex is assigned to by the block assignment function $\hat\tau$ is characterized by which of the at-most-$K$ distinct-valued rows of $C$ the vertex is associated with in $C$, where $C \in \mathbb{R}^{n \times R}$ was defined as the matrix which minimized $\|C - X\|_F$ over all matrices $C \in \mathbb{R}^{n \times R}$ such that there are at most $K$ distinct-valued rows in $C$.

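Denote by $0^{n \times (R - \mathrm{rank}M)}$ the matrix of zeros in $\mathbb{R}^{n \times (R - \mathrm{rank}M)}$. We next show the following:

\[ \text{For any fixed } \xi > \tfrac{3}{8}, \text{ almost always it holds that } \|C - [\mathcal{X}_\ell|0^{n \times (R - \mathrm{rank}M)}]\|_F \le n^\xi. \tag{6} \]

Indeed, by the definition of $C$, the fact that $[\mathcal{X}_\ell|0^{n \times (R - \mathrm{rank}M)}]$ has $K$ distinct-valued rows, and the triangle inequality, we have that

\[ \|C - X\|_F \le \|[\mathcal{X}_\ell|0^{n \times (R - \mathrm{rank}M)}] - X\|_F \le \|[\mathcal{X}_\ell|\mathcal{X}_c] - X\|_F + \|\mathcal{X}_c\|_F. \tag{7} \]

Then, by two uses of the triangle inequality and then Equation (7), we have

\[ \|C - [\mathcal{X}_\ell|0^{n \times (R - \mathrm{rank}M)}]\|_F \le \|C - [\mathcal{X}_\ell|\mathcal{X}_c]\|_F + \|\mathcal{X}_c\|_F \le \|C - X\|_F + \|X - [\mathcal{X}_\ell|\mathcal{X}_c]\|_F + \|\mathcal{X}_c\|_F \le 2\,\|[\mathcal{X}_\ell|\mathcal{X}_c] - X\|_F + 2\,\|\mathcal{X}_c\|_F, \]

which, by Corollary 7, Corollary 8, and Note 9, is almost always bounded by

\[ 2\left[\left(\frac{\sqrt{6}}{\alpha^2\gamma^2} \cdot \sqrt{\log n}\right)^2 + \left(\frac{\sqrt{2}\,3^{5/8}}{\alpha^2\gamma^2} \cdot n^{-1/8}\log^{5/8} n\right)^2\right]^{1/2} + 2R^{1/2}\,3^{1/8}\,n^{3/8}\log^{1/8} n, \]

which is almost always bounded by $n^\xi$ for any fixed $\xi > \frac{3}{8}$. Thus Line (6) is shown.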
Now, it easily follows from Line (6) that

\[ \text{For any fixed } \epsilon > \tfrac{3}{4}, \text{ the number of rows of } C - [\mathcal{X}_\ell|0^{n \times (R - \mathrm{rank}M)}] \text{ with Euclidean norm at least } \tfrac{\alpha\beta\gamma}{3} \text{ is almost always less than } n^\epsilon; \tag{8} \]

indeed, if this was not true, then $\|C - [\mathcal{X}_\ell|0^{n \times (R - \mathrm{rank}M)}]\|_F \ge \sqrt{n^\epsilon \left(\frac{\alpha\beta\gamma}{3}\right)^2}$ would contradict Line (6).
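
The counting step behind Line (8) can be spelled out as follows; this is a sketch in which $m$ (a symbol introduced here only for illustration) denotes the number of rows of $C - [\mathcal{X}_\ell|0^{n \times (R - \mathrm{rank}M)}]$ with Euclidean norm at least $\frac{\alpha\beta\gamma}{3}$, and $\xi$ is any fixed value with $\frac{3}{8} < \xi < \frac{\epsilon}{2}$:

\[ m\left(\frac{\alpha\beta\gamma}{3}\right)^2 \le \|C - [\mathcal{X}_\ell|0^{n \times (R - \mathrm{rank}M)}]\|_F^2 \le n^{2\xi} \quad\Longrightarrow\quad m \le \frac{9}{\alpha^2\beta^2\gamma^2}\, n^{2\xi} < n^\epsilon \ \text{ for all sufficiently large } n. \]

Such a $\xi$ exists precisely because $\epsilon > \frac{3}{4}$, which is where the exponent threshold $\frac{3}{4} = 2 \cdot \frac{3}{8}$ comes from.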

Lastly, form balls $B_1, B_2, \ldots, B_K$ of radius $\frac{\alpha\beta\gamma}{3}$ about the $K$ distinct-valued rows of $[\mathcal{X}_\ell|0^{n \times (R - \mathrm{rank}M)}]$; by Lemma 5, these balls are almost always disjoint. The number of vertices which the block membership function $\tau$ assigns to each block is almost always at least $\alpha n$, thus (by Line (8) and the Pigeonhole Principle) almost always each ball $B_1, B_2, \ldots, B_K$ contains exactly one of the $K$ distinct-valued rows of $C$. And, for any fixed $\epsilon > \frac{3}{4}$, the number of misassignments from $\hat\tau$ is thus almost always less than $n^\epsilon$. Theorem 1 is now proven in the scenario where there is a single modality of communication and it is known that the rows of $M$ are pairwise nonequal.

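In the general case where there are multiple modalities of communication and/or the rows of $M$ are not known to be pairwise nonequal, the above proof holds mutatis mutandis (affecting relevant bounds by at most a constant factor); in place of $X$ use $Y$ or $[X|Y]$ or $[X^{(1)}|X^{(2)}|\cdots|X^{(S)}]$ or $[Y^{(1)}|Y^{(2)}|\cdots|Y^{(S)}]$ or $[X^{(1)}|X^{(2)}|\cdots|X^{(S)}|Y^{(1)}|Y^{(2)}|\cdots|Y^{(S)}]$, and in place of $[\mathcal{X}_\ell|\mathcal{X}_c]$ use $[\mathcal{Y}_\ell|\mathcal{Y}_c]$ or $[\mathcal{X}_\ell|\mathcal{X}_c|\mathcal{Y}_\ell|\mathcal{Y}_c]$ or $[\mathcal{X}^{(1)}_\ell|\mathcal{X}^{(1)}_c|\cdots|\mathcal{X}^{(S)}_\ell|\mathcal{X}^{(S)}_c]$ or $[\mathcal{Y}^{(1)}_\ell|\mathcal{Y}^{(1)}_c|\cdots|\mathcal{Y}^{(S)}_\ell|\mathcal{Y}^{(S)}_c]$ or $[\mathcal{X}^{(1)}_\ell|\mathcal{X}^{(1)}_c|\cdots|\mathcal{X}^{(S)}_\ell|\mathcal{X}^{(S)}_c|\mathcal{Y}^{(1)}_\ell|\mathcal{Y}^{(1)}_c|\cdots|\mathcal{Y}^{(S)}_\ell|\mathcal{Y}^{(S)}_c]$, as appropriate, with similar kinds of adjustments.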

5 Consistent estimation for the number of blocks K 

In this section we provide a consistent estimator $\hat K$ for the number of blocks $K$, if indeed $K$ is not known. (The only assumption used is our basic underlying assumption that an upper bound $R$ is known for $\mathrm{rank}M$.)

To simplify the notation, in this section we assume that there is only one modality of communication and we also assume that it is known that the rows of $M$ are distinct-valued. These simplifying assumptions do not affect the results we obtain, and the analysis can be easily generalized to the general case in the same manner as was done at the end of Section 4.
In the adjacency-spectral partitioning procedure from Section 2.2, recall that one of the steps was to compute $C \in \mathbb{R}^{n \times R}$ which minimized $\|C - X\|_F$ over all matrices $C \in \mathbb{R}^{n \times R}$ such that there are at most $K$ distinct-valued rows in $C$. Then the block assignment function $\hat\tau$ was defined as partitioning the vertices into $K$ blocks according to equal-valued corresponding rows in $C$. Let us now generalize the procedure of Section 2.2. Suppose that, for any fixed positive integer $K'$, we instead compute $C \in \mathbb{R}^{n \times R}$ which minimizes $\|C - X\|_F$ over all matrices $C \in \mathbb{R}^{n \times R}$ such that there are at most $K'$ distinct-valued rows in $C$. Then the block assignment function $\hat\tau$ is defined as partitioning the vertices into $K'$ parts (some possibly empty) according to equal-valued corresponding rows in $C$. We shall call this adjusted procedure "the adjacency-spectral partitioning procedure from Section 2.2 with $K'$ parts."



Theorem 10. Let real number $\xi$ such that $\frac{3}{8} < \xi < \frac{1}{2}$ be chosen and fixed. For the adjacency-spectral procedure from Section 2.2 with $K'$ parts, if $K' = K$ then almost always $\|C - X\|_F \le n^\xi$, and if $K' < K$ then almost always $\|C - X\|_F > n^\xi$.

Proof: Using Equation (7), Corollary 7, Corollary 8, and Note 9 in the manner used to prove Line (6), we obtain that almost always $\|[\mathcal{X}_\ell|0^{n \times (R - \mathrm{rank}M)}] - X\|_F \le n^\xi$, and that if $K' = K$ then almost always $\|C - X\|_F \le n^\xi$.

However, if $K' < K$ then, as we did in Section 4, consider balls $B_1, B_2, \ldots, B_K$ of radius $\frac{\alpha\beta\gamma}{3}$ about the $K$ distinct-valued rows of $[\mathcal{X}_\ell|0^{n \times (R - \mathrm{rank}M)}]$. By Lemma 5, these balls are almost always disjoint and, in fact, their centers are almost always at least $\alpha\beta\gamma$ distance one from the other. By the pigeonhole principle, there is at least one ball that contains none of the $K'$ distinct-valued rows of $C$. Together with the fact that each block almost always has more than $\alpha n$ vertices, we obtain almost always that $\|C - [\mathcal{X}_\ell|0^{n \times (R - \mathrm{rank}M)}]\|_F \ge \sqrt{\alpha n \left(\frac{\alpha\beta\gamma}{3}\right)^2}$. Thus, almost always $\|C - X\|_F \ge \|C - [\mathcal{X}_\ell|0^{n \times (R - \mathrm{rank}M)}]\|_F - \|[\mathcal{X}_\ell|0^{n \times (R - \mathrm{rank}M)}] - X\|_F > n^\xi$. □



Let real number $\xi$ such that $\frac{3}{8} < \xi < \frac{1}{2}$ be chosen and fixed. Define the random variable $\hat K$ to be the least positive integer $K'$ such that for the adjacency-spectral procedure from Section 2.2 with $K'$ parts it happens that $\|C - X\|_F \le n^\xi$. By Theorem 10, we have the following consistency result for $\hat K$.

Theorem 11. Almost always $\hat K = K$.
6 The extended adjacency-spectral partitioning procedure

The adjacency-spectral partitioning procedure of Section 2.2 assumed that an integer $R$ was known such that $R \ge \mathrm{rank}M$, but it also assumed that the number of blocks $K$ was known. We next extend the adjacency-spectral partitioning procedure of Section 2.2 (we call it "the extended adjacency-spectral partitioning procedure") so that it only has the assumption that an integer $R$ is known such that $R \ge \mathrm{rank}M$, and it is not assumed that $K$ is known. The procedure is as follows:

Let real number $\xi$ such that $\frac{3}{8} < \xi < \frac{1}{2}$ be chosen and fixed. Successively for $K' = 1, 2, 3, \ldots$, do the spectral partitioning procedure of Section 2.2 with $K'$ parts until it happens that $\|C - X\|_F \le n^\xi$, then return the $\hat\tau$ from the last successive iteration (i.e. the iteration where $K' = \hat K$).
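
The stopping rule of the extended procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fit_with_parts` is a hypothetical callable standing in for "the procedure of Section 2.2 with $K'$ parts" that returns the residual $\|C - X\|_F$ together with the resulting assignment, and `max_parts` is a safety cap that is not part of the procedure as described.

```python
def extended_procedure(n, xi, fit_with_parts, max_parts=None):
    """Try K' = 1, 2, 3, ... and stop at the first K' whose residual
    ||C - X||_F is at most n**xi; return (K', assignment).

    fit_with_parts(K') is a hypothetical callable returning
    (residual, assignment) for the procedure run with K' parts.
    """
    if not 0.375 < xi < 0.5:
        raise ValueError("xi must lie strictly between 3/8 and 1/2")
    threshold = n ** xi
    limit = n if max_parts is None else max_parts   # cap is an assumption
    for k_prime in range(1, limit + 1):
        residual, assignment = fit_with_parts(k_prime)
        if residual <= threshold:
            return k_prime, assignment
    raise RuntimeError("no K' up to the limit met the threshold")

# Toy residuals (made up for illustration): with n = 10000 and xi = 0.4
# the threshold is 10000**0.4, roughly 39.8, first met at K' = 3.
toy = {1: 500.0, 2: 120.0, 3: 30.0, 4: 29.0}
k_hat, _ = extended_procedure(10_000, 0.4, lambda k: (toy[k], None), max_parts=4)
print(k_hat)
```

Returning the first $K'$ that meets the threshold, rather than the $K'$ with the smallest residual, matters because the residual keeps shrinking as more parts are allowed.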

Theorem 12. With the extended adjacency-spectral partitioning procedure, for any fixed $\epsilon > \frac{3}{4}$, the number of misassignments $\min_{\text{bijections } \pi:\{1,2,\ldots,K\}\to\{1,2,\ldots,K\}} |\{j = 1,2,\ldots,n : \tau(j) \ne \pi(\hat\tau(j))\}|$ is almost always less than $n^\epsilon$.

Proof: Indeed, almost always the last value of $K'$ (which is $\hat K$) is equal to $K$ by Theorem 11, and then almost always the number of misassignments is less than $n^\epsilon$ by Theorem 1. □

7 Another consistent estimator for K 

In Section 5 we provided the consistent estimator $\hat K$ for the number of blocks $K$. It was based on Theorem 10, which contrasted, for the adjacency-spectral procedure from Section 2.2 with $K'$ parts, what would happen when $K' = K$ versus when $K' < K$. In this section we are interested in contrasting, for the adjacency-spectral procedure from Section 2.2 with $K'$ parts, what would happen when $K' = K$ versus when $K' > K$. This yields another consistent estimator for $K$.

For the adjacency-spectral procedure from Section 2.2 with $K'$ parts, the at-most-$K'$ distinct-valued rows of $C$ will be called the centroids, the centroid separation will refer to the minimum Euclidean distance between all pairs of distinct centroids, and the minimum part size will refer to the least cardinality of the $K'$ parts as partitioned by $\hat\tau$; in particular, if one of the parts is empty then the minimum part size is zero, whereas the centroid separation would still be positive.



Theorem 13. For the adjacency-spectral procedure from Section 2.2 with $K'$ parts, if $K' = K$ then almost always the minimum part size is greater than $\alpha n$ and the centroid separation is at least $\frac{\alpha\beta\gamma}{3}$. Let $\zeta > 0$ and $\vartheta > 0$ be any fixed real numbers. If $K' > K$ then almost always it will not hold that the minimum part size is greater than $\vartheta n$ and the centroid separation is at least $\zeta$.

Proof: As we did in Section 4, consider balls $B_1, B_2, \ldots, B_K$ of radius $\frac{\alpha\beta\gamma}{3}$ about the $K$ distinct-valued rows of $[\mathcal{X}_\ell|0^{n \times (R - \mathrm{rank}M)}]$. By Lemma 5, these balls are almost always disjoint and, in fact, their centers are almost always at least $\alpha\beta\gamma$ distance one from the other. If $K = K'$ then recall from Section 4 that almost always each ball contains exactly one centroid. By the $\alpha\beta\gamma$ separation between the balls' centers, we thus have almost always that the centroid separation is at least $\frac{\alpha\beta\gamma}{3}$. Also, by Theorem 1 there is almost always a strictly sublinear number of misassignments, hence almost always the minimum part size is greater than $\alpha n$.

Now to the case of $K' > K$. Suppose by way of contradiction that the minimum part size is greater than $\vartheta n$ and the centroid separation is at least $\zeta$. Since there are strictly more centroids than balls $B_1, B_2, \ldots, B_K$, and because of the $\zeta$ separation between the centroids, by the pigeonhole principle there is at least one centroid with distance at least $\frac{\zeta}{2}$ from each row of $[\mathcal{X}_\ell|0^{n \times (R - \mathrm{rank}M)}]$ (these rows are the centers of the balls). Since this centroid appears as a row of $C$ more than $\vartheta n$ times, this would imply that $\|C - [\mathcal{X}_\ell|0^{n \times (R - \mathrm{rank}M)}]\|_F > \sqrt{\vartheta n \left(\frac{\zeta}{2}\right)^2}$. However, we have by the triangle inequality, the definition of $C$, and the first few lines of the proof of Theorem 10 that almost always

\[ \|C - [\mathcal{X}_\ell|0^{n \times (R - \mathrm{rank}M)}]\|_F \le \|C - X\|_F + \|X - [\mathcal{X}_\ell|0^{n \times (R - \mathrm{rank}M)}]\|_F \le \|C_K - X\|_F + \|X - [\mathcal{X}_\ell|0^{n \times (R - \mathrm{rank}M)}]\|_F \le 2n^\xi < \sqrt{\vartheta n \left(\tfrac{\zeta}{2}\right)^2} \]

(where $\xi$ such that $\frac{3}{8} < \xi < \frac{1}{2}$ is fixed and $C_K$ denotes what $C$ would have been if we instead did the adjacency-spectral procedure from Section 2.2 with $K$ parts instead of $K'$ parts), which gives us the desired contradiction. □



With Theorem 13 we obtain another consistent estimator for $K$. However, we would need to assume that positive real numbers $\zeta$ and $\vartheta$ are known that satisfy $\vartheta \le \alpha$ and $\zeta \le \frac{\alpha\beta\gamma}{3}$. Assuming that such $\zeta$ and $\vartheta$ are indeed known, we can define the random variable $\tilde K$ to be the greatest positive integer $K'$ among the values $1, 2, 3, \ldots, \lfloor \frac{1}{\vartheta} \rfloor$ (note that $\frac{1}{\vartheta}$ is an upper bound on $K$) such that for the adjacency-spectral procedure from Section 2.2 with $K'$ parts the minimum part size is greater than $\vartheta n$ and the centroid separation is at least $\zeta$. By Theorem 13 we immediately obtain the following consistency result for $\tilde K$.

Theorem 14. Almost always $\tilde K = K$.
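
The two diagnostics that define this second estimator, and the estimator itself, can be sketched as follows. The names `min_part_size`, `centroid_separation`, `k_tilde`, and the callable `fit_with_parts` (which returns the labels and the distinct centroids for a run with $K'$ parts) are hypothetical and introduced only for this illustration; treating the separation of a single centroid as infinite is likewise a convention assumed here.

```python
from itertools import combinations
from math import dist, floor, inf

def min_part_size(labels, k_prime):
    """Least cardinality among the K' parts (zero if some part is empty)."""
    counts = [0] * k_prime
    for lab in labels:
        counts[lab] += 1
    return min(counts)

def centroid_separation(centroids):
    """Minimum Euclidean distance over pairs of distinct centroids."""
    if len(centroids) < 2:
        return inf            # assumed convention: no pairs to compare
    return min(dist(a, b) for a, b in combinations(centroids, 2))

def k_tilde(n, zeta, theta, fit_with_parts):
    """Greatest K' in 1..floor(1/theta) whose fit has minimum part size
    greater than theta*n and centroid separation at least zeta."""
    best = None
    for k_prime in range(1, floor(1 / theta) + 1):
        labels, centroids = fit_with_parts(k_prime)
        if (min_part_size(labels, k_prime) > theta * n
                and centroid_separation(centroids) >= zeta):
            best = k_prime
    return best

# Toy fits for n = 12 vertices (labels and centroids are made up).
toy = {
    1: ([0] * 12, [(0.0, 0.0)]),
    2: ([0] * 6 + [1] * 6, [(0.0, 0.0), (1.0, 0.0)]),
    3: ([0] * 6 + [1] * 5 + [2], [(0.0, 0.0), (1.0, 0.0), (1.0, 0.1)]),
    4: ([0, 1, 2, 3] * 3, [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]),
}
print(k_tilde(12, zeta=0.5, theta=0.25, fit_with_parts=lambda k: toy[k]))
```

In the toy data, three parts fail both checks (a part of size 1 and two centroids only 0.1 apart) and four parts fail the size check, so the largest admissible value is 2.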

In order to define $\tilde K$, lower bounds on $\frac{\alpha\beta\gamma}{3}$ and $\alpha$ need to be known, in addition to an upper bound on $\mathrm{rank}M$. This contrasts with $\hat K$, for which we only need to assume that an upper bound on $\mathrm{rank}M$ is known. (Because $\hat K$ requires fewer assumptions, the extended adjacency-spectral partitioning procedure in Section 6 utilizes $\hat K$ and not $\tilde K$.)

Nonetheless, it is useful to be aware of how the adjacency-spectral procedure from Section 2.2 with $K'$ parts changes in behavior when $K'$ becomes greater than $K$, besides how it changes in behavior when $K'$ becomes less than $K$. And when lower bounds on $\frac{\alpha\beta\gamma}{3}$ and $\alpha$ are also known then, in practice for a single value of $n$, we can check for $\hat K = \tilde K$ in order to have more confidence that their common value is indeed $K$.



8 A simulated example and discussion 

As an illustration, consider the stochastic block model with parameters

\[ K = 3, \qquad \rho = \begin{pmatrix} .3 \\ .3 \\ .4 \end{pmatrix}, \qquad M = \begin{pmatrix} .205 & .045 & .150 \\ .045 & .205 & .150 \\ .150 & .150 & .180 \end{pmatrix} \tag{9} \]

(in particular, there is only one modality of communication) and suppose edges are undirected. Here $\mathrm{rank}M = 2$.

For each of the values $R = 1, 2, 3, 10, 25$ and for each number of vertices $n = 100, 200, 300, \ldots$ we generated 2500 Monte Carlo replications of this stochastic block model, and to each of these 2500 realizations we applied the adjacency-spectral partitioning procedure of Section 2.2 using $R$ as the upper bound on $\mathrm{rank}M$ (which, in the case of $R = 1$, is purposely incorrect for illustration purposes), assuming that we know $K = 3$. Note that rather than finding the actual minimum of $\|C - X\|_F$, we use the K-means algorithm, which approximates this minimum. The five curves in Figure 1 correspond to $R = 1, 2, 3, 10, 25$ respectively, and they plot the mean fraction of misassignments (the number of misassigned vertices divided by the total number of vertices $n$, such fractions averaged over the 2500 Monte Carlo replicates) along the y-axis, against the value of $n$ along the x-axis.
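
A single replicate of this kind of experiment might be sketched as below. This is not the authors' simulation code: it assumes NumPy is available, uses a plain Lloyd iteration with one random start as a stand-in for K-means, takes $X = U\sqrt{\Sigma}$ from the top $R$ singular values as the embedding, and all function names are illustrative.

```python
from itertools import permutations
import numpy as np

def sample_sbm(n, rho, M, rng):
    """Undirected SBM without self-loops: returns (adjacency, block labels)."""
    tau = rng.choice(len(rho), size=n, p=rho)
    P = M[np.ix_(tau, tau)]                       # edge probabilities
    upper = np.triu(rng.random((n, n)) < P, k=1)  # independent trials, i < j
    return (upper | upper.T).astype(float), tau

def embed(A, R):
    """Rows of U * sqrt(Sigma) for the R largest singular values of A."""
    U, s, _ = np.linalg.svd(A)
    return U[:, :R] * np.sqrt(s[:R])

def nearest(X, centers):
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, K, rng, iters=50):
    """Plain Lloyd iterations; a heuristic stand-in for the minimizer C."""
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(iters):
        labels = nearest(X, centers)
        for k in range(K):
            if np.any(labels == k):               # leave empty parts in place
                centers[k] = X[labels == k].mean(axis=0)
    return nearest(X, centers), centers

def misassignment_fraction(tau, labels, K):
    """Misassigned fraction, minimized over relabelings of the K parts."""
    best = min(int(np.sum(tau != np.array(pi)[labels]))
               for pi in permutations(range(K)))
    return best / len(tau)

rng = np.random.default_rng(0)
rho = np.array([0.3, 0.3, 0.4])
M = np.array([[0.205, 0.045, 0.150],
              [0.045, 0.205, 0.150],
              [0.150, 0.150, 0.180]])
A, tau = sample_sbm(300, rho, M, rng)
X = embed(A, R=2)
labels, _ = kmeans(X, 3, rng)
frac = misassignment_fraction(tau, labels, 3)
print(frac)
```

Averaging `frac` over many replicates, and repeating for each $R$ and $n$, would give curves of the kind described above; the full SVD used here is chosen for brevity, not speed.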
Figure 1: The mean misassignment fraction plotted against $n$, for each of $R = 1, 2, 3, 10, 25$.

Note that when $R = 2$ the performance of the adjacency-spectral partitioning is excellent (in fact, the number of misassignments becomes effectively zero as $n$ gets to 1600). Indeed, even when $R = 10$ and $R = 25$ (which are substantially greater than $\mathrm{rank}M = 2$) the adjacency-spectral partitioning performs very well. However, when $R = 1$, which is not an upper bound on $\mathrm{rank}M$ (violating our one assumption in this article), the misassignment rate of adjacency-spectral partitioning is almost as bad as chance.

Next we will consider the estimator for $K$ proposed in Section 5. Recall that this estimator is defined as $\hat K = \min\{K' : \|C_{K'} - X\|_F \le n^\xi\} = \min\{K' : \log_n(\|C_{K'} - X\|_F) \le \xi\}$, where $C_{K'}$ is the $n \times R$ matrix of centroids associated with each vertex, the adjacency spectral clustering procedure in Section 2.2 is done with $K'$ parts, and $\xi \in (3/8, 1/2)$ is fixed. We
now consider stochastic block model parameters with stronger differences between blocks to 
illustrate the effectiveness of the estimator. In particular we let 



(10) 



so that $\mathrm{rank}M = 3$. For each $n = 100, 200, 400, 800, 1600, 3200, 6400$ we generated 50 Monte Carlo replications of this stochastic block model. To each of these 50 realizations we performed the adjacency spectral clustering procedure using $R = 3$ (Figure 2, left panel) and $R = 6$ (Figure 2, right panel) as our upper bound, but this time assuming $K$ is not known. We used $K' = 2, 3, 4$ and computed the statistic $\log_n(\|C_{K'} - X\|_F)$. Figure 2 shows the mean and standard deviation of this test statistic over the 50 Monte Carlo replicates for each $R$, $K'$ and $n$.

Figure 2: Test statistic for estimating $K$ using the parameters in line (10) for $R = 3, 6$ and $K' = 2, 3, 4$. The unmarked dashed line shows $\xi = 3/8$.

The results demonstrate that for $n = 6400$, $\hat K$ is a good estimate when $R = 3 = \mathrm{rank}M$ and we choose $\xi$ close to 3/8. On the other hand, for smaller values of $n$, our estimator will select too few blocks regardless of the choice of $\xi \in (3/8, 1/2)$. Interestingly, choosing $\xi$ close to 3/8, $\hat K$ always equals the true number of blocks when we let $R = 6 = 2\,\mathrm{rank}M$, suggesting that this estimator has interesting behavior as a function of $R$. Note that for larger values of $\xi$, $\hat K$ will tend to be smaller, and for smaller values of $\xi$, $\hat K$ will tend to be larger.



9 Discussion 

Our simulation experiment for estimating K demonstrates that good performance is possible 
for moderate n under certain parameter selections. This buttresses the theoretical and 
practical interest, as this estimator may serve as a stepping stone for the development of 
other more effective estimators. Indeed, bounds shown in [lj)j suggest that it may be possible 
to allow $\xi$ to be as small as 1/4 using different proof methods. These methods in terms of
the operator norm are an important area for further investigation when considering spectral
techniques for inference on random graphs.

Note additionally that for our first simulation, we used K-means rather than exactly minimizing $\|C - X\|_F$, since the latter is computationally infeasible. This, together with fast methods to compute the singular value decomposition, indicates that this method can be used even on quite large graphs. For even larger graphs, there are also techniques to approximate the singular value decomposition that should be considered in future work.



Further extensions of this work can be made in various directions. Rohe et al. [20] 

and others allow for the number of blocks to grow. We believe that this method could be 
extended to this scenario, though careful analysis is necessary to show that the estimator for 
the number of blocks is still consistent. 

Another avenue is the problem of missing data, in the form of missing edges; results for this setting follow immediately provided that the edges are missing uniformly at random. This is because the observed graph will still be a stochastic block model with the same block structure. Other forms of missing data are deserving of further study. Sparse graphs are also of interest, and this work can likely be extended to the case of moderately sparse graphs, for example with minimum degree $\Omega(n/\sqrt{\log n})$, without significant additional machinery. Another form of missing data is that, since we consider graphs with no self-loops, the diagonal entries of the adjacency matrix are all zero. Marchette et al. [16] and Scheinerman and Tucker [22] both suggest methods to impute the diagonals, and this has been shown to improve inference in practice.

This is related to one final point to mention: Is it better to do spectral partitioning on 
the adjacency matrix (as we do here in this article) or on the Laplacian (to be used in place 
of the adjacency matrix in our procedure of this article)? There doesn't currently seem to be 
a clear answer; for some choices of stochastic block model parameters it seems empirically 
that the adjacency matrix gives fewer misassignments than the Laplacian, and for other 
choices of parameters the Laplacian seems to be better. A determination of exact criteria
(on the stochastic block model parameters) for which the adjacency matrix is better than 
the Laplacian and vice versa deserves attention in future work. But the analysis that we 
used here to reduce the required knowledge of the model parameters and to show robustness 
in the procedure will hopefully serve as an impetus to achieve formal results for spectral 
partitioning in the nonparametric setting for which the block model assumptions don't hold. 